Elizabeth S. Bentley, Lisimachos P. Kondi, Member, IEEE, John D. Matyjas, Michael J. Medley, Senior Member, IEEE, and Bruce W. Suter


Abstract: In this paper, we propose an approach to manage network resources for a direct sequence code division multiple access (DS-CDMA) visual sensor network where nodes monitor scenes with varying levels of motion. It uses cross-layer optimization across the physical layer, the link layer, and the application layer. Our technique simultaneously assigns a source coding rate, a channel coding rate, and a power level to all nodes in the network based on one of two criteria that maximize the quality of video of the entire network as a whole, subject to a constraint on the total chip rate. One criterion results in the minimal average end-to-end distortion amongst all nodes, while the other criterion minimizes the maximum distortion of the network. Our experimental results demonstrate the effectiveness of the cross-layer optimization.

Index Terms: Code division multiple access (CDMA), convolutional codes, cross-layer, H.264, joint source-channel coding, multimedia communications, power control, resource allocation, spread spectrum, visual sensor network.


I.  Introduction 


In this paper, we consider a direct sequence code division multiple access (DS-CDMA) visual sensor network where we assume that the nodes in the network are equipped with a video camera deployed to survey a large area. Many sensor networks concern themselves with increasing the energy efficiency and maximizing the lifetime of the network as in [1]-[3]. Some visual sensor networks focus on image transmission as in [4], where the trade-off between image quality and energy consumption of different routing paths is considered. Visual sensor networks that transmit video are much more challenging than typical sensor networks due to the high bit rates and delay constraints. In [5], the demanding nature of visual sensor networks is acknowledged and a new wireless sensor node protocol stack is proposed. However, the improvements that can be gained with cross-layer interactions are not considered.

In our set-up, some nodes will be imaging a relatively stationary field while other nodes will be imaging scenes with a high level of motion, which creates a more realistic scenario where scenes with varying levels of motion exist. Video sequences with less motion can be source encoded at a lower bit rate while still yielding good picture quality. The centralized control unit at the network layer should be able to request that the video at specific nodes be transmitted at a lower bit rate, if that rate is deemed capable of still producing adequate video quality. The nodes that compress their video at a lower bit rate are left with more bits for channel coding and can afford to transmit at a lower power, so that they cause less interference to the other nodes. For this reason, DS-CDMA is an appropriate choice for use in our visual sensor network set-up.

In this work, we present a multi-node cross-layer optimization technique that operates across the physical, data link, and application layers of the system. Our algorithm accounts for network performance all the way from the physical layer up to the application layer. At the application layer, the source coding rate for video compression is determined. At the data link layer, the channel coding rate is selected, and at the physical layer, the transmission power is determined. Our algorithm simultaneously allocates a source coding rate, a channel coding rate, and a power level to all nodes in a DS-CDMA visual sensor network. We propose to jointly optimize all nodes using one of two criteria. Our first criterion results in the minimal average end-to-end distortion over all the nodes in the network while our second criterion minimizes the maximum distortion amongst all nodes. This optimization algorithm uses universal rate distortion characteristics (URDC) to reduce the computational complexity. Zero-mean Gaussian interference is assumed to obtain the probability of error in a channel using Viterbi's upper bound on the probability of error. Some preliminary results appear in [6], and an earlier version of this article was published in [7].

The rest of the paper is organized as follows. In Section II, we describe the transmission parameters in a visual sensor network: the DS-CDMA physical layer in Section II-A, the source coding in Section II-B, and the channel coding in Section II-C. In Section III, the cross-layer optimization algorithm is explained. In Section IV, experimental results are presented, and in Section V, conclusions are drawn.

II.  Visual  Sensor  Networks 

Sensor networks previously focused on networks that transmit scalar information such as temperature, pressure, acoustic data, etc. Visual sensor networks are much more challenging due to the high bit rates and delay constraints required for video transmission. These networks are comprised of typically low-weight distributed sensor nodes that can communicate directly (not via intermediate nodes) with a centralized control unit at the network layer. The centralized control unit performs channel and source decoding to obtain the received video from each node. The control unit transmits information to the nodes in order to request changes in transmission parameters, such as source coding rate, channel coding rate, and transmission power. Applications of visual sensor networks include surveillance, automatic tracking and signaling of intruders within a physical area, command and control of unmanned vehicles, and environmental monitoring.

A.  DS-CDMA 

This work considers a wireless visual sensor network that utilizes DS-CDMA. In DS-CDMA, all users (nodes) transmit on the same frequency. In order to transmit a single bit, a node actually transmits $L$ "chips". Thus, each node $k$ is associated with a spreading code (signature sequence) $\mathbf{s}_k$, which is a vector of length $L$. To transmit the $i$th bit of a bit stream, node $k$ actually transmits $b_k(i)\,\mathbf{s}_k$, which is a vector of $L$ chips, where $b_k(i)$ is either $1$ or $-1$, depending on the value of the bit that is being transmitted.
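
As a small illustration of the spreading step described above, the following sketch maps bits to $b_k(i) \in \{-1, +1\}$, multiplies them by a length-$L$ signature sequence, and recovers them by correlation. The orthogonal length-4 codes, the unit amplitudes, and the noiseless synchronous superposition are assumptions made only for this example; the function names are hypothetical.

```python
def spread(bits, code):
    """Map each bit (0/1) to b = -1/+1 and multiply it by the length-L code."""
    chips = []
    for bit in bits:
        b = 1 if bit else -1
        chips.append([b * c for c in code])
    return chips


def despread(received, code):
    """Correlate each received chip vector with one node's code; decide by sign."""
    decided = []
    for r in received:
        corr = sum(x * c for x, c in zip(r, code))
        decided.append(1 if corr > 0 else 0)
    return decided


# Hypothetical orthogonal signature sequences for K = 2 nodes, L = 4 chips.
s1 = [1, 1, 1, 1]
s2 = [1, -1, 1, -1]
bits1 = [1, 0, 1]
bits2 = [0, 0, 1]

tx1, tx2 = spread(bits1, s1), spread(bits2, s2)
# Synchronous, noiseless superposition of both nodes with unit amplitudes.
rx = [[a + b for a, b in zip(v1, v2)] for v1, v2 in zip(tx1, tx2)]
```

With orthogonal codes the other node contributes nothing to the correlation, so `despread(rx, s1)` returns `bits1`. With non-orthogonal codes, the cross-correlation terms would appear as multiple-access interference.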

Assuming there are $K$ nodes in a synchronous single-path binary phase shift keying (BPSK) channel, the received signal can be expressed as $\mathbf{r}(i) = A_1 b_1(i)\,\mathbf{s}_1 + \sum_{k=2}^{K} A_k b_k(i)\,\mathbf{s}_k + \mathbf{n}_k$, where $A_k$, $b_k(i)$, $\mathbf{s}_k$, and $\mathbf{n}_k$ are the amplitude, symbol stream, signature sequence, and noise of node $k$, respectively. $\mathbf{r}(i)$, $\mathbf{s}_k$, and $\mathbf{n}_k$ are vectors of length $L$. DS-CDMA systems are usually interference-limited systems. Thus, it is reasonable to ignore thermal noise and background noise and assume that the interference can be approximated by a zero-mean white Gaussian random process [8]. Since user $k$ has an associated power level in Watts, $S_k = E_k R_k$, the energy-per-bit to multiple-access-interference (MAI) ratio becomes

$$\frac{E_k}{I_0} = \frac{S_k / R_k}{\sum_{j=1,\, j \neq k}^{K} S_j / W_t} \tag{1}$$

where $E_k$ is the energy-per-bit, $I_0/2$ is the two-sided noise power spectral density due to MAI in Watts/Hertz, $S_k$ is the power level of the node-of-interest in Watts, $R_k$ is the transmitted bit rate in bits per second, $S_j$ is the power level of interfering node $j$ in Watts, and $W_t$ is the total bandwidth in Hertz [8]. The term "power level" refers to the power that is received by the centralized control unit. For a given power level, nodes can determine the required transmit power using a propagation model. $R_k$ is taken to be the total bit rate used for source and channel coding. Assuming $K$ users, $R_k$ can be expressed as

$$R_k = \frac{R_{s,k}}{R_{c,k}}, \quad k = 1, 2, 3, \ldots, K \tag{2}$$

where $R_{s,k}$ is the source coding rate for node $k$ and $R_{c,k}$ is the channel coding rate for node $k$. Since $R_{s,k}$ has units of bits per second and $R_{c,k}$ is a dimensionless number, $R_k$ will be measured in bits per second [9].

Let us also define the vectors $\mathbf{R}_s = [R_{s,1}, R_{s,2}, \ldots, R_{s,K}]^T$, $\mathbf{R}_c = [R_{c,1}, R_{c,2}, \ldots, R_{c,K}]^T$, and $\mathbf{S} = [S_1, S_2, \ldots, S_K]^T$.

B. Source Coding

In our visual sensor network, we assume that the nodes are equipped with video cameras that monitor various fields. We assume that each node has the computational power necessary for video compression. The video captured by the cameras is compressed using the H.264/AVC video coding standard. H.264/AVC has two conceptual layers, the video coding layer (VCL) and the network abstraction layer (NAL). The VCL forms the main part of H.264/AVC and performs the tasks required for video compression to efficiently represent the content of the video data. The NAL achieves the network-friendly objective of H.264/AVC. It defines the interface between the VCL and the broad variety of systems and transport media [10]. All data are encapsulated in NAL units, which contain an integer number of bytes.

C. Channel Coding

In this work, we use rate compatible punctured convolutional (RCPC) codes for channel coding [11]. Using RCPC codes allows us to utilize Viterbi's upper bound on the bit error probability, $P_b$, given by

$$P_b \leq \frac{1}{P} \sum_{d = d_{\text{free}}}^{\infty} c_d P_d \tag{3}$$

where $P$ is the period of the code, $d_{\text{free}}$ is the free distance of the code, $c_d$ is the information error weight, and $P_d$ is the probability that the wrong path at distance $d$ is selected. An AWGN channel with binary phase-shift keying (BPSK) modulation has a $P_d$ given by

$$P_d = \frac{1}{2}\,\mathrm{erfc}\!\left(\sqrt{d R_c \frac{E_b}{I_0}}\right) \tag{4}$$

where $R_c$ is the channel coding rate and $E_b/I_0$ is the energy-per-bit to multiple-access-interference ratio, measured in Watts/Hertz.
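
To make the chain from power levels to bit error probability concrete, the sketch below evaluates the ratio in (1) and a truncated form of the bound in (3). It assumes the standard BPSK/AWGN pairwise error expression $P_d = \tfrac{1}{2}\,\mathrm{erfc}\big(\sqrt{d R_c E_b/I_0}\big)$; the distance spectrum `c_d`, the free distance, and the period used in the example are hypothetical placeholders, since the actual values come from the RCPC code tables in [11]. Clipping the bound at 1 is a convenience added here, not part of the paper.

```python
import math


def ek_over_i0(powers, rates, w_t, k):
    """Eq. (1): (S_k / R_k) divided by the MAI density sum_{j != k} S_j / W_t."""
    interference = sum(s for j, s in enumerate(powers) if j != k) / w_t
    return (powers[k] / rates[k]) / interference


def pairwise_error(d, r_c, ebi0):
    """Assumed BPSK/AWGN form: P_d = 0.5 * erfc(sqrt(d * R_c * E_b/I_0))."""
    return 0.5 * math.erfc(math.sqrt(d * r_c * ebi0))


def viterbi_bound(period, d_free, c_d, r_c, ebi0):
    """Eq. (3) truncated to len(c_d) terms, for d = d_free, d_free + 1, ..."""
    total = sum(c * pairwise_error(d_free + i, r_c, ebi0)
                for i, c in enumerate(c_d))
    return min(1.0, total / period)  # clip, since the bound can exceed 1


# Illustrative numbers only: three nodes at 144 kbit/s in W_t = 20 MHz.
powers = [5.0, 10.0, 15.0]          # received power levels in Watts
rates = [144000.0, 144000.0, 144000.0]
ratio = ek_over_i0(powers, rates, 20e6, 0)

# Hypothetical distance spectrum, NOT taken from the RCPC tables in [11].
pb = viterbi_bound(period=8, d_free=6, c_d=[2, 16, 32], r_c=0.5, ebi0=ratio)
```

Raising a node's own power level increases its ratio and lowers its bound, while raising any other node's power has the opposite effect, which is the coupling the optimization has to resolve.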

D.  Expected  Video  Distortion  Calculation 

Clearly, the expected video distortion for a node should depend on the corresponding bit error rate. In this work, we utilize universal rate-distortion characteristics (URDCs) [9]. These characteristics show the expected distortion as a function of the bit error probabilities, $P_b$, after channel decoding. However, since video encoded with the H.264 codec is designed to handle packet errors as opposed to bit errors, we need to calculate the resulting packet loss rate. We calculate a real-time transport protocol (RTP) packet loss rate (PLR) from a certain bit error rate (BER), drop packets from the H.264 bitstream according to the RTP PLR, and pass the corrupted H.264 bitstream to the H.264 decoder to calculate the distortion of the uncompressed video.

The link layer packet size is $LL_{\text{size}}$ (measured in bits). Thus, the link layer PLR is $PLR_{LL} = 1 - (1 - BER)^{LL_{\text{size}}}$, where $PLR_{LL}$ is the packet loss rate for a link layer packet of size $LL_{\text{size}}$. Similarly, we calculate the RTP packet loss rate with $PLR_{RTP} = 1 - (1 - PLR_{LL})^{RTP_{\text{size}}}$, where $PLR_{RTP}$ is the packet loss rate for an RTP packet of size $RTP_{\text{size}}$ (measured in the number of link layer packets). The RTP provides a packet format for real-time data transmissions [12]. We assume that we know when a packet has an error, and we manually drop packets with any errors from the H.264 encoded video stream, in accordance with the $PLR_{RTP}$ calculated from the BER. We then calculate the distortion of this "corrupted" video stream. This creates the relation between the BER and the distortion of a packet-based video stream with packet errors. As mentioned previously, the bit error rate we are interested in for the URDCs is the bit error rate after channel decoding. Thus, in our case, $P_b = BER$.
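
The two packet loss expressions above translate directly into code. The sketch below is only an illustration: the 400-bit link layer packet matches the value used later in the experiments, whereas the RTP packet size of 3 link layer packets is an assumed value chosen for the example.

```python
def plr_link_layer(ber, ll_size_bits):
    """PLR_LL = 1 - (1 - BER)^LL_size: a packet is lost if any bit is in error."""
    return 1.0 - (1.0 - ber) ** ll_size_bits


def plr_rtp(ber, ll_size_bits, rtp_size_packets):
    """PLR_RTP = 1 - (1 - PLR_LL)^RTP_size, RTP_size in link layer packets."""
    plr_ll = plr_link_layer(ber, ll_size_bits)
    return 1.0 - (1.0 - plr_ll) ** rtp_size_packets


ber = 1e-5                       # bit error rate after channel decoding
ll_loss = plr_link_layer(ber, 400)
rtp_loss = plr_rtp(ber, 400, 3)  # RTP size of 3 is an assumption
```

Because both steps assume independent errors, the RTP loss rate equals the probability that at least one of the $LL_{\text{size}} \cdot RTP_{\text{size}}$ bits is in error.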

Since channel errors are random, the video distortion $D_{s+c,k}$ of node $k$, which is due to both the lossy compression and channel errors, is a random variable. Thus, it does not suffice to calculate the video distortion for just one realization of the channel. Instead, we will consider the expected value of the distortion, $E[D_{s+c,k}]$.

As in [13] and [14], we assume the following model for the URDC for each user $k$:

$$E[D_{s+c,k}] = a \left[ \log_{10}\!\left( \frac{1}{P_b} \right) \right]^{b} \tag{5}$$

where $a$ and $b$ are such that the square of the approximation error is minimized. Thus, instead of calculating the URDCs based on experimental results for every possible $P_b$, we experimentally calculate the expected distortion for a few packet loss rates associated with specific bit error rates, $P_b$. We then use the model, given in (5), to approximate the distortion for other bit error rates. The parameters $a$ and $b$ depend on the video sequence and the source coding rate.
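
One simple way to obtain $a$ and $b$ from a handful of measured points is sketched below, assuming the model takes the form $E[D_{s+c}] = a\,[\log_{10}(1/P_b)]^{b}$. Note the substitution: the fit here is an ordinary linear least-squares fit in the log domain, which minimizes the squared error of $\ln E[D_{s+c}]$ rather than of the distortion itself, so it is only an approximation of the fitting criterion stated in the text. The data points are synthetic, generated from made-up parameters, and do not come from the paper.

```python
import math


def urdc(pb, a, b):
    """Model form assumed here: E[D] = a * (log10(1 / P_b)) ** b."""
    return a * math.log10(1.0 / pb) ** b


def fit_urdc(pbs, distortions):
    """Least-squares line through (ln log10(1/P_b), ln D); returns (a, b)."""
    xs = [math.log(math.log10(1.0 / p)) for p in pbs]
    ys = [math.log(d) for d in distortions]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic "measurements" from made-up parameters (not results from the paper).
pbs = [1e-7, 1e-6, 1e-5, 1e-4, 1e-3]
true_a, true_b = 400.0, -1.5
measured = [urdc(p, true_a, true_b) for p in pbs]

a_hat, b_hat = fit_urdc(pbs, measured)
```

On noise-free synthetic data the fit recovers the generating parameters; with real measurements, a nonlinear least-squares refinement started from `(a_hat, b_hat)` would match the stated criterion more closely.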

The expected distortion $E[D_{s+c,k}]$ for node $k$ is a function of the source and channel coding rates $R_{s,k}$ and $R_{c,k}$ for node $k$, and the power levels of all nodes $S_k$, $k = 1, \ldots, K$. This can be seen as follows. From (1), assuming that all users transmit at the same total bit rate (and thus chip rate), the $E_k/I_0$ for node $k$ is a function of the power levels of all nodes. Parameters $c_d$ and $d_{\text{free}}$ depend on the channel coding rate. Thus, from (3) and (4), it follows that the bit error probability $P_b$ for node $k$ is a function of $E_k/I_0$ and the channel coding rate $R_{c,k}$. Parameters $a$ and $b$ depend on the source coding rate and the encoded video sequence. Therefore, from (5), it follows that $E[D_{s+c,k}]$ is a function of the bit error probability $P_b$, the source coding rate $R_{s,k}$, and the encoded video sequence. Thus, to summarize, we can write the expected distortion as $E[D_{s+c,k}](R_{s,k}, R_{c,k}, \mathbf{S})$.


III.  Optimal  Resource  Allocation 


A.  Problem  Formulation 

A  centralized  control  unit  at  the  network  layer  determines 
how  network  resources  should  be  allocated  amongst  the  nodes. 
It  can  request  changes  in  transmission  parameters,  such  as  the 
source  coding  rates,  channel  coding  rates,  and  power  levels. 
There  are  two  criteria  we  will  utilize  to  optimally  allocate  the 
network  resources  to  each  node  in  the  network.  The  constraint 
for  both  criteria  is  that  the  chip  rate  be  the  same  for  all  nodes. 


Assuming that the spreading code length is the same for all nodes, a constraint on the chip rate corresponds to a constraint on the transmission bit rate $R_k$. Thus, we can equivalently impose a constraint on the bit rate instead of the chip rate. The first criterion we will employ can be formally stated as follows: Given a total target bit rate, $R_{\text{budget}}$, determine the vectors of optimal source coding rates $\mathbf{R}_s^*$, channel coding rates $\mathbf{R}_c^*$, and power levels $\mathbf{S}^*$ such that the overall end-to-end distortion $D_{\text{ave}}(\mathbf{R}_s, \mathbf{R}_c, \mathbf{S})$ over all nodes is minimized:

$$\{\mathbf{R}_s^*, \mathbf{R}_c^*, \mathbf{S}^*\} = \arg\min_{\mathbf{R}_s, \mathbf{R}_c, \mathbf{S}} D_{\text{ave}}(\mathbf{R}_s, \mathbf{R}_c, \mathbf{S})$$
$$\text{subject to } R_1 = R_2 = \cdots = R_K = R_{\text{budget}} \tag{6}$$

with $R_k = R_{s,k}/R_{c,k}$ and $D_{\text{ave}}(\mathbf{R}_s, \mathbf{R}_c, \mathbf{S}) = (1/K) \sum_{k=1}^{K} E[D_{s+c,k}](R_{s,k}, R_{c,k}, \mathbf{S})$.

The second criterion we will use to allocate resources to the nodes in the network minimizes the maximum distortion:

$$\{\mathbf{R}_s^*, \mathbf{R}_c^*, \mathbf{S}^*\} = \arg\min_{\mathbf{R}_s, \mathbf{R}_c, \mathbf{S}} \left\{ \max_{k} E[D_{s+c,k}](R_{s,k}, R_{c,k}, \mathbf{S}) \right\}$$
$$\text{subject to } R_1 = R_2 = \cdots = R_K = R_{\text{budget}} \tag{7}$$

with $R_k = R_{s,k}/R_{c,k}$. This formulation assumes that the videos from all sensors are equally important, but allows sensors that image low-motion scenes to use a lower source coding rate. This criterion guarantees fairness among all sensors, since we are minimizing the worst distortion among all sensors. The problem is a discrete optimization problem, that is, $R_{s,k}$, $R_{c,k}$, and $S_k$ can only take values from discrete sets $\mathcal{R}_s$, $\mathcal{R}_c$, and $\mathcal{S}$, respectively, i.e., $R_{s,k} \in \mathcal{R}_s$, $R_{c,k} \in \mathcal{R}_c$, $S_k \in \mathcal{S}$ [9].

We  assume  that  the  K  nodes  are  grouped  into  N  motion 
classes  according  to  the  amount  of  motion  in  the  scenes  they 
are  imaging.  For  example,  if  N  =  2,  we  can  have  two  classes 
of  nodes,  low-motion  nodes  and  high-motion  nodes.  We  assume 
that  each  class  has  its  own  set  of  URDC  curves  (5).  Thus,  instead 
of  determining  the  source  coding  rate,  channel  coding  rate,  and 
power  for  each  node,  we  just  determine  these  parameters  for 
each  class. 


B.  Optimization  Algorithm  and  Computational  Complexity 

We next discuss our proposed discrete optimization algorithm and its computational complexity. Given that for all admissible $(R_{s,k}, R_{c,k})$ pairs, we should have $R_{s,k}/R_{c,k} = R_{\text{budget}}$, the cardinalities $|\mathcal{R}_s|$ and $|\mathcal{R}_c|$ should be equal, i.e., $|\mathcal{R}_s| = |\mathcal{R}_c| = C$. The number of admissible $(R_{s,k}, R_{c,k})$ pairs should also be equal to $C$. For each class of nodes, the source coding rate-channel coding rate pair and the power level should be selected. The number of possible choices for each class of nodes is $C \cdot |\mathcal{S}|$, where $|\mathcal{S}|$ is the cardinality of set $\mathcal{S}$. If there are $N$ motion classes, the total number of admissible combinations of source coding rate, channel coding rate, and power level for each class of nodes is $(C \cdot |\mathcal{S}|)^N$.

TABLE I
MAD With Various Distributions of Node Types: Target Bit Rate = 144 000 bits/s

| Low | (R_s1, R_c1, S_1) | D_s+c,1 | High | (R_s2, R_c2, S_2) | D_s+c,2 | D_ave | PSNR_ave (dB) |
|---|---|---|---|---|---|---|---|
| 90 | (72k, 1/2, 10) | 7.8 | 10 | (96k, 2/3, 15) | 19.6 | 9.0 | 38.6 |
| 70 | (72k, 1/2, 10) | 9.0 | 30 | (96k, 2/3, 15) | 21.0 | 12.6 | 37.1 |
| 50 | (48k, 1/3, 5) | 11.9 | 50 | (96k, 2/3, 10) | 20.6 | 16.3 | 36.0 |
| 30 | (48k, 1/3, 5) | 13.8 | 70 | (96k, 2/3, 10) | 23.3 | 20.4 | 35.0 |
| 10 | (48k, 1/3, 5) | 16.3 | 90 | (96k, 2/3, 10) | 26.9 | 25.8 | 34.0 |

TABLE II
MAD With Various Distributions of Node Types: Target Bit Rate = 96 000 bits/s

| Low | (R_s1, R_c1, S_1) | D_s+c,1 | High | (R_s2, R_c2, S_2) | D_s+c,2 | D_ave | PSNR_ave (dB) |
|---|---|---|---|---|---|---|---|
| 90 | (48k, 1/2, 5) | 7.9 | 10 | (64k, 2/3, 15) | 23.5 | 9.5 | 38.4 |
| 70 | (48k, 1/2, 5) | 8.5 | 30 | (64k, 2/3, 10) | 31.7 | 15.4 | 36.2 |
| 50 | (48k, 1/2, 5) | 9.7 | 50 | (64k, 2/3, 10) | 35.1 | 22.4 | 34.6 |
| 30 | (48k, 1/2, 5) | 11.3 | 70 | (64k, 2/3, 10) | 38.8 | 30.5 | 33.3 |
| 10 | (48k, 1/2, 5) | 13.3 | 90 | (64k, 2/3, 10) | 43.0 | 40.0 | 32.1 |

The problems of (6) and (7) could be solved using exhaustive search, by trying out all $(C \cdot |\mathcal{S}|)^N$ combinations and selecting the one that minimizes the corresponding expression. However, the computational complexity can be reduced by noting that $E[D_{s+c,k}](R_{s,k}, R_{c,k}, \mathbf{S})$ for node $k$ is not affected by the choices of source coding rates and channel coding rates of the other users. It is only affected by the power selections of the other users. There are $|\mathcal{S}|^N$ possible power allocations among the classes of nodes. For each power allocation, each class of nodes should select the best $(R_{s,k}, R_{c,k})$ pair (the one that minimizes the expected distortion). Since there are $C$ such pairs, $C - 1$ comparisons will be needed. In order to do that for all classes of nodes, the total number of comparisons will be $N(C - 1)|\mathcal{S}|^N$. Thus, for each of the $|\mathcal{S}|^N$ power allocations $\mathbf{S}$, we have found the source-channel coding rate combinations that would minimize the expected distortion for each class of nodes.

Thus, to solve the problem of (6), we need to find the minimum of $|\mathcal{S}|^N$ numbers. For that, we need $|\mathcal{S}|^N - 1$ comparisons. To summarize, we need a total of $N(C - 1)|\mathcal{S}|^N + |\mathcal{S}|^N - 1$ comparisons to solve the problem of (6).

In order to solve the problem of (7), for each of the $|\mathcal{S}|^N$ power combinations, we need to compare the distortions for each class of nodes and find the maximum distortion among the node classes. For this, we will need $N - 1$ comparisons. After we do that, we need to find the minimum of these values among the $|\mathcal{S}|^N$ combinations. So, we need a total of $N(C - 1)|\mathcal{S}|^N + (N - 1) + |\mathcal{S}|^N - 1$ comparisons in order to solve the problem of (7).
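
The reduced-complexity search described above can be sketched as follows: loop over the power allocations, let each class pick its best rate pair independently, and then apply either the average (MAD) or the maximum (MMD) criterion. This is a minimal sketch, not the authors' implementation. The function `toy_distortion` is a hypothetical stand-in for the URDC-based expected distortion, and the function names and class sizes are assumptions made for the example.

```python
from itertools import product


def allocate(class_sizes, rate_pairs, power_levels, distortion, criterion="MAD"):
    """Return (score, [(Rs, Rc, S) per class]) minimizing the chosen criterion.

    distortion(c, rs, rc, powers) gives the expected distortion of class c
    when the classes use the power levels in the tuple `powers`.
    """
    n_classes = len(class_sizes)
    total_nodes = sum(class_sizes)
    best = None
    for powers in product(power_levels, repeat=n_classes):  # |S|^N allocations
        choice, dists = [], []
        for c in range(n_classes):
            # Each class picks its own best (Rs, Rc) pair for this allocation.
            rs, rc = min(rate_pairs,
                         key=lambda p: distortion(c, p[0], p[1], powers))
            choice.append((rs, rc, powers[c]))
            dists.append(distortion(c, rs, rc, powers))
        if criterion == "MAD":   # node-weighted average distortion
            score = sum(k * d for k, d in zip(class_sizes, dists)) / total_nodes
        else:                    # "MMD": worst distortion among the classes
            score = max(dists)
        if best is None or score < best[0]:
            best = (score, choice)
    return best


def toy_distortion(c, rs, rc, powers):
    """Hypothetical stand-in for E[D_{s+c}]; NOT the paper's URDC model."""
    motion = (1.0, 3.0)[c]                     # class 1 images more motion
    interference = sum(powers) - powers[c]     # power levels of other classes
    return motion * 1.0e6 / rs + 5.0 * rc * interference / powers[c]


pairs = [(48000, 1 / 3), (72000, 1 / 2), (96000, 2 / 3)]
levels = [5, 10, 15]
mad = allocate([50, 50], pairs, levels, toy_distortion, "MAD")
mmd = allocate([50, 50], pairs, levels, toy_distortion, "MMD")
```

Because a class's distortion does not depend on the rate pairs chosen by the other classes, picking each class's pair independently inside the power loop gives the same optimum as the exhaustive search over all $(C \cdot |\mathcal{S}|)^N$ combinations.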


IV.  Experimental  Results 

We next provide experimental results using software simulations. We perform the optimization procedure discussed in Section III using the proposed model for URDCs. The data points used to obtain the parameters $a$ and $b$ in (5) are obtained by corrupting the video stream with packet errors based on a calculated $P_b$, decoding the corrupted video bit stream with the H.264/AVC codec, calculating the distortion, repeating this experiment 300 times, and then taking the average distortion. We assume that there are two possible motion levels viewed by the sensor nodes, low motion and high motion. Thus, there are two node classes ($N = 2$). The "Akiyo" sequence is used to represent a low-motion node, and the "Foreman" sequence is used to represent a high-motion node. It is necessary to have two sets of URDC curves, one for each level of motion. The characteristics were obtained for both video sequences at a frame rate of 15 frames/s.

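We use BPSK modulation and RCPC codes with mother code rate 1/4 from [11] for channel coding. We set the link layer packet size, $LL_{\text{size}}$, to 400 bits. We examine target bit rate constraints at 144 000 bits/s and 96 000 bits/s. The total bandwidth, $W_t$, was set to 20 MHz. The set of admissible source coding rates and corresponding channel coding rates for the different target bit rates are

$$R = 144\,000 \;\rightarrow\; (R_s, R_c) \in \left\{ \left(48 \text{ kbps}, \tfrac{1}{3}\right), \left(72 \text{ kbps}, \tfrac{1}{2}\right), \left(96 \text{ kbps}, \tfrac{2}{3}\right) \right\} \tag{8}$$

$$R = 96\,000 \;\rightarrow\; (R_s, R_c) \in \left\{ \left(32 \text{ kbps}, \tfrac{1}{3}\right), \left(48 \text{ kbps}, \tfrac{1}{2}\right), \left(64 \text{ kbps}, \tfrac{2}{3}\right) \right\} \tag{9}$$

The power levels in Watts were chosen from $\mathcal{S} = \{5, 10, 15\}$ Watts. Thus, $C = 3$ and $|\mathcal{S}| = 3$.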
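
As a quick consistency check, the admissible rate pairs can be verified against the constraint $R_{s}/R_{c} = R_{\text{budget}}$. The sketch below lists the pairs as they appear in the result tables (source rates of 48, 72, and 96 kbps at the 144 000 bits/s target and 32, 48, and 64 kbps at the 96 000 bits/s target); the dictionary layout and function name are illustrative choices, not part of the paper.

```python
from fractions import Fraction

# (source coding rate in bits/s, channel coding rate) per target bit rate.
ADMISSIBLE = {
    144000: [(48000, Fraction(1, 3)), (72000, Fraction(1, 2)), (96000, Fraction(2, 3))],
    96000: [(32000, Fraction(1, 3)), (48000, Fraction(1, 2)), (64000, Fraction(2, 3))],
}
POWER_LEVELS = [5, 10, 15]  # Watts


def meets_budget(budget, pairs):
    """True if every pair satisfies Rs / Rc == budget exactly."""
    return all(rs / rc == budget for rs, rc in pairs)


checks = {budget: meets_budget(budget, pairs) for budget, pairs in ADMISSIBLE.items()}
```

Using `Fraction` keeps the division exact, so the check does not depend on floating-point rounding of 1/3 and 2/3.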

TABLE III
MMD With Various Distributions of Node Types: Target Bit Rate = 144 000 bits/s

| Low | (R_s1, R_c1, S_1) | D_s+c,1 | High | (R_s2, R_c2, S_2) | D_s+c,2 | D_ave | PSNR_ave (dB) |
|---|---|---|---|---|---|---|---|
| 90 | (48k, 1/3, 5) | 9.6 | 10 | (96k, 2/3, 15) | 14.5 | 10.1 | 38.1 |
| 70 | (48k, 1/3, 5) | 12.8 | 30 | (96k, 2/3, 15) | 16.5 | 13.9 | 36.7 |
| 50 | (48k, 1/3, 5) | 17.9 | 50 | (96k, 2/3, 15) | 18.9 | 18.4 | 35.5 |
| 30 | (48k, 1/3, 5) | 13.8 | 70 | (96k, 2/3, 10) | 23.3 | 20.4 | 35.0 |
| 10 | (48k, 1/3, 5) | 16.3 | 90 | (96k, 2/3, 10) | 26.9 | 25.8 | 34.0 |

TABLE IV
MMD With Various Distributions of Node Types: Target Bit Rate = 96 000 bits/s

| Low | (R_s1, R_c1, S_1) | D_s+c,1 | High | (R_s2, R_c2, S_2) | D_s+c,2 | D_ave | PSNR_ave (dB) |
|---|---|---|---|---|---|---|---|
| 90 | (48k, 1/2, 5) | 7.9 | 10 | (64k, 2/3, 15) | 23.5 | 9.5 | 38.4 |
| 70 | (48k, 1/2, 5) | 10.4 | 30 | (64k, 2/3, 15) | 27.9 | 15.7 | 36.2 |
| 50 | (48k, 1/2, 5) | 14.6 | 50 | (64k, 2/3, 15) | 32.2 | 23.4 | 34.4 |
| 30 | (32k, 1/3, 5) | 21.1 | 70 | (64k, 2/3, 15) | 36.9 | 32.1 | 33.1 |
| 10 | (32k, 1/3, 5) | 25.7 | 90 | (64k, 2/3, 15) | 42.2 | 40.6 | 32.0 |

TABLE V
MAD With Equal Distributions of Node Types: Target Bit Rate = 144 000 bits/s

| Low | (R_s1, R_c1, S_1) | D_s+c,1 | High | (R_s2, R_c2, S_2) | D_s+c,2 | D_ave | PSNR_ave (dB) |
|---|---|---|---|---|---|---|---|
| 10 | (96k, 2/3, 15) | 1.8 | 10 | (96k, 2/3, 15) | 12.1 | 6.9 | 39.7 |
| 30 | (96k, 2/3, 10) | 4.7 | 30 | (96k, 2/3, 15) | 16.0 | 10.4 | 38.0 |
| 50 | (48k, 1/3, 5) | 11.9 | 50 | (96k, 2/3, 10) | 20.6 | 16.3 | 36.0 |
| 70 | (48k, 1/3, 5) | 20.0 | 70 | (96k, 2/3, 10) | 32.8 | 26.4 | 33.9 |
| 90 | (48k, 1/3, 10) | 24.5 | 90 | (48k, 1/3, 15) | 68.3 | 46.4 | 31.5 |

TABLE VI
MAD With Equal Distributions of Node Types: Target Bit Rate = 96 000 bits/s

| Low | (R_s1, R_c1, S_1) | D_s+c,1 | High | (R_s2, R_c2, S_2) | D_s+c,2 | D_ave | PSNR_ave (dB) |
|---|---|---|---|---|---|---|---|
| 10 | (64k, 2/3, 15) | 3.0 | 10 | (64k, 2/3, 15) | 22.4 | 12.7 | 37.1 |
| 30 | (48k, 1/2, 5) | 7.9 | 30 | (64k, 2/3, 15) | 23.5 | 15.7 | 36.2 |
| 50 | (48k, 1/2, 5) | 9.7 | 50 | (64k, 2/3, 10) | 35.1 | 22.4 | 34.6 |
| 70 | (48k, 1/2, 10) | 11.7 | 70 | (48k, 1/2, 15) | 49.7 | 30.7 | 33.3 |
| 90 | (32k, 1/3, 10) | 19.8 | 90 | (48k, 1/2, 15) | 58.9 | 39.3 | 32.2 |

TABLE VII
MMD With Equal Distributions of Node Types: Target Bit Rate = 144 000 bits/s

| Low | (R_s1, R_c1, S_1) | D_s+c,1 | High | (R_s2, R_c2, S_2) | D_s+c,2 | D_ave | PSNR_ave (dB) |
|---|---|---|---|---|---|---|---|
| 10 | (72k, 1/2, 15) | 1.8 | 10 | (96k, 2/3, 15) | 12.1 | 6.9 | 39.7 |
| 30 | (48k, 1/3, 5) | 9.6 | 30 | (96k, 2/3, 15) | 14.5 | 12.1 | 37.3 |
| 50 | (48k, 1/3, 5) | 17.9 | 50 | (96k, 2/3, 15) | 18.9 | 18.4 | 35.5 |
| 70 | (48k, 1/3, 5) | 20.0 | 70 | (96k, 2/3, 10) | 32.8 | 26.4 | 33.9 |
| 90 | (48k, 1/3, 10) | 24.5 | 90 | (48k, 1/3, 15) | 68.3 | 46.4 | 31.5 |

In Tables I-VIII, we show how the network resources should be assigned for various distributions of the two types of nodes for different target bit rates. The low-motion nodes' source coding rate in bits per second, channel coding rate, and power level in Watts are represented by $R_{s1}$, $R_{c1}$, and $S_1$, respectively, and the high-motion nodes' parameters are represented by $R_{s2}$, $R_{c2}$, and $S_2$. The number of low-motion nodes is given under the column "Low", and the number of high-motion nodes is given under the column "High". "MAD" corresponds to the method of Minimizing the Average end-to-end Distortion over all users, and "MMD" corresponds to the technique of Minimizing the Maximum Distortion. In Tables I-IV, the distribution of the two types of nodes is varied while the total number of nodes is kept constant. Tables I and II use the MAD criterion while Tables III and IV utilize the MMD criterion. We give the resulting average end-to-end peak signal-to-noise ratio (PSNR) in dB for the entire network as a measure of performance for the MAD experiments. We also use the minimum PSNR as a measure of performance for the MMD experiments. The PSNR is calculated from the expected distortion as $PSNR = 10 \log_{10}\left(255^2 / E[D_{s+c}]\right)$, where PSNR is the peak signal-to-noise ratio and $E[D_{s+c}]$ is the expected distortion due to source coding and channel errors.
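
The average-distortion and PSNR columns of the tables can be reproduced from the per-class distortions. The sketch below assumes a base-10 logarithm in the PSNR expression and a node-weighted mean for $D_{\text{ave}}$; the function names are illustrative. The example uses the first row of Table I (90 low-motion nodes with distortion 7.8 and 10 high-motion nodes with distortion 19.6).

```python
import math


def psnr_db(expected_distortion, peak=255.0):
    """PSNR = 10 * log10(peak^2 / E[D]) in dB, for 8-bit video (peak = 255)."""
    return 10.0 * math.log10(peak ** 2 / expected_distortion)


def average_distortion(n_low, d_low, n_high, d_high):
    """Node-weighted mean of the two class distortions."""
    return (n_low * d_low + n_high * d_high) / (n_low + n_high)


d_ave = average_distortion(90, 7.8, 10, 19.6)  # first row of Table I
psnr_ave = psnr_db(d_ave)
```

Rounded to one decimal place, these give 9.0 and 38.6 dB, matching the $D_{\text{ave}}$ and PSNR entries of that row.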

TABLE VIII
MMD With Equal Distributions of Node Types: Target Bit Rate = 96 000 bits/s

| Low | (R_s1, R_c1, S_1) | D_s+c,1 | High | (R_s2, R_c2, S_2) | D_s+c,2 | D_ave | PSNR_ave (dB) |
|---|---|---|---|---|---|---|---|
| 10 | (48k, 1/2, 15) | 3.0 | 10 | (48k, 1/2, 15) | 22.4 | 12.7 | 37.1 |
| 30 | (48k, 1/2, 5) | 7.9 | 30 | (64k, 2/3, 15) | 23.5 | 15.7 | 36.2 |
| 50 | (48k, 1/2, 5) | 14.6 | 50 | (64k, 2/3, 15) | 32.2 | 23.4 | 34.4 |
| 70 | (32k, 1/3, 5) | 25.7 | 70 | (64k, 2/3, 15) | 42.2 | 34.0 | 32.8 |
| 90 | (32k, 1/3, 5) | 51.5 | 90 | (48k, 1/2, 15) | 50.5 | 51.0 | 31.1 |

From  the  discussion  of  Section  III-B,  we  can  see  that 
we  need  44  comparisons  for  MAD  and  45  comparisons  for 
MMD. 
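
These counts follow from the expressions of Section III-B with $N = 2$ motion classes, $C = 3$ admissible rate pairs, and $|\mathcal{S}| = 3$ power levels. A short sketch of the arithmetic, with illustrative function names, is given below; the exhaustive-search size $(C \cdot |\mathcal{S}|)^N$ is included for comparison.

```python
def comparisons_mad(n_classes, c_pairs, s_levels):
    """N(C - 1)|S|^N + |S|^N - 1 comparisons for problem (6)."""
    allocations = s_levels ** n_classes
    return n_classes * (c_pairs - 1) * allocations + allocations - 1


def comparisons_mmd(n_classes, c_pairs, s_levels):
    """N(C - 1)|S|^N + (N - 1) + |S|^N - 1 comparisons for problem (7)."""
    allocations = s_levels ** n_classes
    return (n_classes * (c_pairs - 1) * allocations
            + (n_classes - 1) + allocations - 1)


def exhaustive_combinations(n_classes, c_pairs, s_levels):
    """(C * |S|)^N combinations examined by exhaustive search."""
    return (c_pairs * s_levels) ** n_classes


mad_count = comparisons_mad(2, 3, 3)          # 2*2*9 + 9 - 1 = 44
mmd_count = comparisons_mmd(2, 3, 3)          # 2*2*9 + 1 + 9 - 1 = 45
exhaustive = exhaustive_combinations(2, 3, 3)  # 9^2 = 81
```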

As expected, the PSNR decreases as you move down the MAD tables because the number of high-motion nodes is increasing. When using MMD, we observe that the value of the average PSNR can actually increase at some points as you move down the table. This occurs when the maximum distortion switches from being that of the high-motion node to that of the low-motion node and vice versa. We see that in most MAD cases, high-motion nodes are assigned a higher source coding rate than the low-motion nodes. This is because the drop in the end-to-end distortion when increasing the source coding rate for a high-motion video sequence is more significant than the effect of employing stronger channel coding. However, the distortions for the low-motion video sequence remain relatively low, even when the source coding rate is decreased, so a low-motion node can afford to transmit at a lower source coding rate in some cases.

V.  Conclusions 

In this paper, we presented a cross-layer optimization algorithm that works across the physical layer, the data link layer, and the application layer in a wireless visual sensor network. This algorithm accounts for network performance all the way from the physical layer up to the application layer. At the application layer, we determined the source coding rate, $R_s$, for video compression. At the data link layer, we assigned the channel coding rate, $R_c$. At the physical layer, we selected the power level, $S$. The algorithm shows how to distribute these parameters among all the nodes transmitting in the network. To create a realistic DS-CDMA visual sensor network, different levels of motion were assumed to be imaged by the nodes. By utilizing the parametric model for the URDCs, we estimated each node's expected distortion in a computationally efficient manner. We presented the combinations of $\{R_s, R_c, S\}$ for each node that result in the minimal average end-to-end distortion over all nodes in the system and the combinations that minimize the maximum distortion. We also showed how to determine the minimum total bandwidth needed to obtain a specific level of quality for the desired number of nodes and which target chip rate achieves the highest average PSNR for a given total bandwidth. Our experimental results demonstrated the effectiveness of the proposed cross-layer optimization scheme.